237 research outputs found

    Query-Driven Sampling for Collective Entity Resolution

    Probabilistic databases play a preeminent role in the processing and management of uncertain data. Recently, many database research efforts have integrated probabilistic models into databases to support tasks such as information extraction and labeling. Many of these efforts are based on batch-oriented inference, which inhibits a real-time workflow. One important task is entity resolution (ER). ER is the process of determining records (mentions) in a database that correspond to the same real-world entity. Traditional pairwise ER methods can lead to inconsistencies and low accuracy due to localized decisions. Leading ER systems solve this problem by collectively resolving all records using a probabilistic graphical model and Markov chain Monte Carlo (MCMC) inference. However, for large datasets this is an extremely expensive process. One key observation is that such an exhaustive ER process incurs a huge up-front cost, which is wasteful in practice because most users are interested in only a small subset of entities. In this paper, we advocate pay-as-you-go entity resolution by developing a number of query-driven collective ER techniques. We introduce two classes of SQL queries that involve ER operators: selection-driven ER and join-driven ER. We implement novel variations of the MCMC Metropolis-Hastings algorithm to generate biased samples and selectivity-based scheduling algorithms to support the two classes of ER queries. Finally, we show that query-driven ER algorithms can converge and return results within minutes over a database populated with the extraction from a newswire dataset containing 71 million mentions.

    Infection and venous thromboembolism in patients undergoing colorectal surgery: what is the relationship?

    BACKGROUND: There is evidence demonstrating an association between infection and venous thromboembolism. We recently identified this association in the postoperative setting; however, the temporal relationship between infection and venous thromboembolism is not well defined. OBJECTIVE: We sought to determine the temporal relationship between venous thromboembolism and postoperative infectious complications in patients undergoing colorectal surgery. DESIGN, SETTING, AND PATIENTS: A retrospective cohort analysis was performed using data for patients undergoing colorectal surgery in the National Surgical Quality Improvement Program 2010 database. MAIN OUTCOME MEASURES: The primary outcome measures were the rate and timing of venous thromboembolism and postoperative infection among patients undergoing colorectal surgery during 30 postoperative days. RESULTS: Of 39,831 patients who underwent colorectal surgery, the overall rate of venous thromboembolism was 2.4% (n = 948); 729 (1.8%) patients were diagnosed with deep vein thrombosis, and 307 (0.77%) patients were diagnosed with pulmonary embolism. Eighty-eight (0.22%) patients were reported as developing both deep vein thrombosis and pulmonary embolism. Following colorectal surgery, the development of a urinary tract infection, pneumonia, organ space surgical site infection, or deep surgical site infection was associated with a significantly increased risk for venous thromboembolism. The majority (52%-85%) of venous thromboembolisms in this population occurred the same day or a median of 3.5 to 8 days following the diagnosis of infection. The approximate relative risk for developing any venous thromboembolism increased each day following the development of each type of infection (range, 0.40%-1.0%) in comparison with patients not developing an infection. LIMITATIONS: We are unable to account for differences in data collection, prophylaxis, and venous thromboembolism surveillance between hospitals in the database. Additionally, there is limited patient follow-up. CONCLUSIONS: These findings of a temporal association between infection and venous thromboembolism suggest a potential early indicator for using certain postoperative infectious complications as clinical warning signs that a patient is more likely to develop venous thromboembolism. Further studies into best practices for prevention are warranted.

    Influence of human impact and bedrock differences on the vegetational history of the Insubrian Southern Alps

    Vegetation history for the study region is reconstructed on the basis of pollen, charcoal and AMS ¹⁴C investigations of lake sediments from Lago del Segrino (calcareous bedrock) and Lago di Muzzano (siliceous bedrock). Late-glacial forests were characterised by Betula and Pinus sylvestris. At the beginning of the Holocene they were replaced by temperate continental forest and shrub communities. A special type of temperate lowland forest, with Abies alba as the most important tree, was present in the period 8300 to 4500 B.P. Subsequently, Fagus, Quercus and Alnus glutinosa were the main forest components and A. alba ceased to be of importance. Castanea sativa and Juglans regia were probably introduced after forest clearance by fire during the first century A.D. On soils derived from siliceous bedrock, C. sativa was already dominant at ca. A.D. 200 (A.D. dates are in calendar years). In limestone areas, however, C. sativa failed to achieve a dominant role. After the introduction of C. sativa, the main trees were initially oak (Quercus spp.) and later the walnut (Juglans regia). Ostrya carpinifolia became the dominant tree around Lago del Segrino only in the last 100–200 years though it had spread into the area at ca. 5000 cal. B.C. This recent expansion of Ostrya is confirmed at other sites and appears to be controlled by human disturbances involving especially clearance. It is argued that these forests should not be regarded as climax communities. It is suggested that under undisturbed succession they would develop into mixed deciduous forests consisting of Fraxinus excelsior, Tilia, Ulmus, Quercus and Acer.

    Composition of Haar Paraproducts: The Random Case

    When is the composition of paraproducts bounded? This is an important and difficult question, related to a question of Sarason on composition of Hankel matrices, and the two-weight problem for the Hilbert transform. We consider randomized variants of this question, finding non-classical characterizations for dyadic paraproducts. Comment: 13 pages. Submitted. v2: \showkeys commented out, with other minor changes.

    Symmetries and Asymmetries of B -> K* mu+ mu- Decays in the Standard Model and Beyond

    The rare decay B -> K* (-> K pi) mu+ mu- is regarded as one of the crucial channels for B physics as the polarization of the K* allows a precise angular reconstruction resulting in many observables that offer new important tests of the Standard Model and its extensions. These angular observables can be expressed in terms of CP-conserving and CP-violating quantities which we study in terms of the full form factors calculated from QCD sum rules on the light-cone, including QCD factorization corrections. We investigate all observables in the context of the Standard Model and various New Physics models, in particular the Littlest Higgs model with T-parity and various MSSM scenarios, identifying those observables with small to moderate dependence on hadronic quantities and large impact of New Physics. One important result of our studies is that new CP-violating phases will produce clean signals in CP-violating asymmetries. We also identify a number of correlations between various observables which will allow a clear distinction between different New Physics scenarios. Comment: 56 pages, 18 figures, 14 tables. v5: Missing factor in eqs. (3.31-32) and fig. 6 corrected. Minor misprints in eq. (2.10) and table A corrected. Conclusions unchanged.

    Solvation free energy profile of the SCN- ion across the water-1,2-dichloroethane liquid/liquid interface. A computer simulation study

    The solvation free energy profile of a single SCN- ion is calculated across the water-1,2-dichloroethane liquid/liquid interface at 298 K by the constraint force method. The obtained results show that the free energy cost of transferring the ion from the aqueous to the organic phase is about 70 kJ/mol. The free energy profile shows a small but clear well at the aqueous side of the interface, in the subsurface region of the water phase, indicating the ability of the SCN- ion to be adsorbed in the close vicinity of the interface. Upon entrance of the SCN- ion to the organic phase a coextraction of the water molecules of its first hydration shell occurs. Accordingly, when it is located at the boundary of the two phases the SCN- ion prefers orientations in which its bulky S atom is located at the aqueous side, and the small N atom, together with its first hydration shell, at the organic side of the interface.

    Analysis of Endocrine Disruption in Southern California Coastal Fish Using an Aquatic Multispecies Microarray

    Background: Endocrine disruptors include plasticizers, pesticides, detergents, and pharmaceuticals. Turbot and other flatfish are used to characterize the presence of chemicals in the marine environment. Unfortunately, there are relatively few genes of turbot and other flatfish in GenBank, which limits the use of molecular tools such as microarrays and quantitative reverse-transcriptase polymerase chain reaction (qRT-PCR) to study disruption of endocrine responses in sentinel fish captured by regulatory agencies. Objectives: We fabricated a multigene cross-species microarray as a diagnostic tool to screen the effects of environmental chemicals in fish, for which there is minimal genomic information. The array included genes that are involved in the actions of adrenal and sex steroids, thyroid hormone, and xenobiotic responses. This microarray will provide a sensitive tool for screening for the presence of chemicals with adverse effects on endocrine responses in coastal fish species. Methods: We used a custom multispecies microarray to study gene expression in wild hornyhead turbot (Pleuronichthys verticalis) collected from polluted and clean coastal waters and in laboratory male zebrafish (Danio rerio) after exposure to estradiol and 4-nonylphenol. We measured gene-specific expression in turbot liver by qRT-PCR and correlated it to microarray data. Results: Microarray and qRT-PCR analyses of livers from turbot collected from polluted areas revealed altered gene expression profiles compared with those from nonaffected areas. Conclusions: The agreement between the array data and qRT-PCR analyses validates this multispecies microarray. The microarray measurement of gene expression in zebrafish, which are phylogenetically distant from turbot, indicates that this multispecies microarray will be useful for measuring endocrine responses in other fish.